For this task, we are asked to assess whether bookmakers are good at pricing over/under bets. I selected the following five bookmakers:
Pinnacle
Betsafe
Sportingbet
Tipico
William Hill
First, I read the data and calculated the necessary values with the code below.
library(data.table)
library(anytime)
library(plotly)
matches<-data.table(readRDS("df9b1196-e3cf-4cc7-9159-f236fe738215_matches.RDS"))
odds<-data.table(readRDS("df9b1196-e3cf-4cc7-9159-f236fe738215_odd_details.RDS"))
matches=unique(matches)
#Converting the date from Unix
matches[,match_date:=anydate(date)]
matches[,match_time:=anytime(date)]
matches=matches[order(home,-match_time)]
matches[,c("match_date","date"):=NULL]
#Finding Out the Over Games
matches[,c("HomeGoals","AwayGoals"):=tstrsplit(score,':')]
matches$HomeGoals=as.numeric(matches$HomeGoals)
matches$AwayGoals=as.numeric(matches$AwayGoals)
matches[,TotalGoals:=HomeGoals+AwayGoals]
matches[,IsOver:=0]
matches[TotalGoals>2,IsOver:=1]
matches=matches[complete.cases(matches)]
#Finding the Year, Month, Date and Hour Information
matches[,Year:=year(match_time)]
matches[,Month:=month(match_time)]
matches[,Weekday:=wday(match_time)]
matches[,Hour:=hour(match_time)]
#Selecting the Over Under Bets with Total Handicap of 2.5
odds_ov_un=odds[betType=='ou' & totalhandicap=='2.5']
odds_ov_un[,totalhandicap:=NULL]
#Finding Out the Initial and Final Odds
odds_ov_un=odds_ov_un[order(matchId, oddtype,bookmaker,date)]
odds_ov_un_initial=odds_ov_un[,list(start_odd=odd[1]),
by=list(matchId,oddtype,bookmaker)]
odds_ov_un_final=odds_ov_un[,list(final_odd=odd[.N]),
by=list(matchId,oddtype,bookmaker)]
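To make the first/last selection concrete, here is a minimal sketch on a made-up table. The match ID, timestamps and odds are invented for illustration; only the column names follow the data above. After ordering by date within each group, odd[1] is the opening odd and odd[.N] is the closing odd.

```r
library(data.table)

# Toy odds history; all values are invented for illustration
toy_odds <- data.table(
  matchId   = "m1",
  oddtype   = rep(c("over", "under"), each = 3),
  bookmaker = "Pinnacle",
  date      = c(3, 1, 2, 2, 3, 1),                 # deliberately unsorted timestamps
  odd       = c(1.95, 1.80, 1.90, 2.00, 1.90, 2.05)
)

# Sort so that the first row of each group is the earliest quote
toy_odds <- toy_odds[order(matchId, oddtype, bookmaker, date)]

# Opening and closing odd per match, odd type and bookmaker
toy_first_last <- toy_odds[, list(start_odd = odd[1], final_odd = odd[.N]),
                           by = list(matchId, oddtype, bookmaker)]
print(toy_first_last)
```

Without the ordering step, odd[1] and odd[.N] would simply be the first and last rows as stored, which need not be the earliest and latest quotes.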
Then I extracted the initial and final odds for my selected bookmakers and reshaped them to wide format, with one column per odd type.
pinnacle_over_under_initial=odds_ov_un_initial[bookmaker=='Pinnacle']
Betsafe_over_under_initial=odds_ov_un_initial[bookmaker=='Betsafe']
Sportingbet_over_under_initial=odds_ov_un_initial[bookmaker=='Sportingbet']
Tipico_over_under_initial=odds_ov_un_initial[bookmaker=='Tipico']
WilliamHill_over_under_initial=odds_ov_un_initial[bookmaker=='William Hill']
pinnacle_over_under_final=odds_ov_un_final[bookmaker=='Pinnacle']
Betsafe_over_under_final=odds_ov_un_final[bookmaker=='Betsafe']
Sportingbet_over_under_final=odds_ov_un_final[bookmaker=='Sportingbet']
Tipico_over_under_final=odds_ov_un_final[bookmaker=='Tipico']
WilliamHill_over_under_final=odds_ov_un_final[bookmaker=='William Hill']
pinnacle_wide_initial=dcast(pinnacle_over_under_initial,
matchId~oddtype,
value.var='start_odd')
Betsafe_wide_initial=dcast(Betsafe_over_under_initial,
matchId~oddtype,
value.var='start_odd')
Sportingbet_wide_initial=dcast(Sportingbet_over_under_initial,
matchId~oddtype,
value.var='start_odd')
Tipico_wide_initial=dcast(Tipico_over_under_initial,
matchId~oddtype,
value.var='start_odd')
WilliamHill_wide_initial=dcast(WilliamHill_over_under_initial,
matchId~oddtype,
value.var='start_odd')
pinnacle_wide_final=dcast(pinnacle_over_under_final,
matchId~oddtype,
value.var='final_odd')
Betsafe_wide_final=dcast(Betsafe_over_under_final,
matchId~oddtype,
value.var='final_odd')
Sportingbet_wide_final=dcast(Sportingbet_over_under_final,
matchId~oddtype,
value.var='final_odd')
Tipico_wide_final=dcast(Tipico_over_under_final,
matchId~oddtype,
value.var='final_odd')
WilliamHill_wide_final=dcast(WilliamHill_over_under_final,
matchId~oddtype,
value.var='final_odd')
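The nearly identical filter and dcast calls above could also be generated in a loop. The following is only a sketch on invented data, not the code used for the analysis; the helper name widen_by_bookmaker and the toy table are hypothetical.

```r
library(data.table)

# Hypothetical helper: returns one wide table (matchId x oddtype) per bookmaker
widen_by_bookmaker <- function(long_odds, bookmakers, value_col) {
  wide_tables <- lapply(bookmakers, function(bm) {
    dcast(long_odds[bookmaker == bm], matchId ~ oddtype, value.var = value_col)
  })
  setNames(wide_tables, bookmakers)
}

# Toy initial odds for two matches and two bookmakers (invented values)
toy_initial <- data.table(
  matchId   = rep(c("m1", "m2"), each = 4),
  oddtype   = rep(c("over", "under"), times = 4),
  bookmaker = rep(rep(c("Pinnacle", "Tipico"), each = 2), times = 2),
  start_odd = c(1.90, 1.95, 1.85, 2.00, 2.10, 1.75, 2.05, 1.80)
)

wide_initial <- widen_by_bookmaker(toy_initial, c("Pinnacle", "Tipico"), "start_odd")
print(wide_initial[["Pinnacle"]])
```

The result is a named list, so wide_initial[["Tipico"]] plays the role of Tipico_wide_initial above.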
I binned the implied over probabilities into intervals of width 0.1.
For Pinnacle:
#Pinnacle
##Initial
merged_matches=merge(matches,pinnacle_wide_initial,by='matchId')
merged_matches[,probOver:=1/over]
merged_matches[,probUnder:=1/under]
merged_matches[,totalProb:=probOver+probUnder]
merged_matches[,probOver:=probOver/totalProb]
merged_matches[,probUnder:=probUnder/totalProb]
merged_matches=merged_matches[complete.cases(merged_matches)]
merged_matches[,totalProb:=NULL]
cutpoints=c(seq(0,1,0.1))
merged_matches[,odd_cut_over:=cut(probOver,cutpoints)]
summary_table=merged_matches[,list(empirical_over=mean(IsOver),
probabilistic_over=mean(probOver),.N),
by=list(Year,odd_cut_over)]
summary_table=summary_table[order(Year)]
plot(summary_table[,list(empirical_over,probabilistic_over)],cex=4,main="Initial Probability Data")
abline(0,1,col='red')
Based on the plot for the initial odds, most of the points lie on or near the x=y line, meaning that Pinnacle’s implied probabilities match the observed frequencies well. For low over probabilities, most of the games did end up as under, so the bookmaker did not have to pay out to bettors who backed over. I ran essentially the same code for the final odds and for the other bookmakers, so from here on I only discuss the plots.
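Since the same steps are repeated for every bookmaker and for both the initial and final odds, they can be wrapped in one function. The sketch below is an assumption about how that could look, not the code actually used for the remaining plots; calibration_summary is a hypothetical name and the toy inputs are invented.

```r
library(data.table)

# Hypothetical helper: merges match results with one bookmaker's wide odds table,
# converts the odds to normalized implied probabilities and bins them
calibration_summary <- function(matches, wide_odds, bin_width = 0.1) {
  merged <- merge(matches, wide_odds, by = "matchId")
  merged <- merged[!is.na(over) & !is.na(under)]
  # Inverse odds sum to more than 1 because of the bookmaker margin,
  # so divide by their sum to obtain probabilities that add up to 1
  merged[, probOver := (1 / over) / (1 / over + 1 / under)]
  merged[, odd_cut_over := cut(probOver, seq(0, 1, bin_width))]
  summary_table <- merged[, list(empirical_over = mean(IsOver),
                                 probabilistic_over = mean(probOver), .N),
                          by = list(Year, odd_cut_over)]
  summary_table[order(Year)]
}

# Toy inputs (invented): three matches priced 1.80/2.00 and one priced 2.50/1.50
toy_matches <- data.table(matchId = c("m1", "m2", "m3", "m4"),
                          Year = 2018, IsOver = c(1, 0, 1, 1))
toy_wide <- data.table(matchId = c("m1", "m2", "m3", "m4"),
                       over  = c(1.80, 1.80, 1.80, 2.50),
                       under = c(2.00, 2.00, 2.00, 1.50))

toy_summary <- calibration_summary(toy_matches, toy_wide)
print(toy_summary)
```

For odds of 1.80 and 2.00 the normalized over probability is 2.00 / (1.80 + 2.00), about 0.526, so those three matches fall into the bin (0.5,0.6]. A well-calibrated bookmaker would show empirical_over close to probabilistic_over in each row.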
Based on the graph for the final odds, Pinnacle has adjusted its odds, and more points now lie on the x=y line. Also, most of the games with low over probabilities did end up as under, so the adjustment was profitable for Pinnacle.
For Betsafe:
Based on the graph, if Betsafe had kept these odds, the company would have lost money. There are some matches where the probability of over is high, meaning that a bet on under pays better, and those matches ended up as under. Let’s see whether the company adjusted these odds in the final probabilities graph.
According to the plot, some adjustments were made for the problematic matches. There are still some matches that are far from the empirical data, but overall the final odds are better.
For Sportingbet:
The plot shows that Sportingbet is actually pretty spot on with the initial odds. Most of the points are around the x=y line. Let’s see whether the company made some adjustments to the outliers.
With some adjustments, the company’s odds became more accurate. So far, for all the companies, the final odds predict better than the initial odds. Let’s see the last two companies.
For Tipico:
Most of the points are on the x=y line. There are no matches with a high over probability that ended up as under. Let’s see whether the adjustments made things worse.
This time the adjustment actually made things worse. Now there are some matches with a high over probability that ended up as under. If the company had stuck with the initial odds, this wouldn’t have happened.
For William Hill:
The plot suggests that William Hill is good at estimating probabilities with its initial odds. Let’s see the final odds graph.
After the adjustments, some matches with high over probabilities did end up as under. The company was better off without the adjustment.
I decided to continue with William Hill. With the code below, I generated graphs of the empirical and implied probabilities over the years.
merged_matches=merge(matches,WilliamHill_wide_initial,by='matchId')
merged_matches[,probOver:=1/over]
merged_matches[,probUnder:=1/under]
merged_matches[,totalProb:=probOver+probUnder]
merged_matches[,probOver:=probOver/totalProb]
merged_matches[,probUnder:=probUnder/totalProb]
merged_matches=merged_matches[complete.cases(merged_matches)]
merged_matches[,totalProb:=NULL]
cutpoints=c(seq(0,1,0.1))
merged_matches[,odd_cut_over:=cut(probOver,cutpoints)]
summary_table=merged_matches[,list(empirical_over=mean(IsOver),
probabilistic_over=mean(probOver),.N),
by=list(Year,odd_cut_over)]
summary_table=summary_table[order(Year)]
p1_initial <- plot_ly(summary_table, x = ~Year[odd_cut_over=="(0.4,0.5]"], y = ~empirical_over[odd_cut_over=="(0.4,0.5]"], name = 'Empirical for (0.4,0.5]', type = 'scatter', mode = 'lines')%>%
add_trace(y = ~probabilistic_over[odd_cut_over=="(0.4,0.5]"], name = 'Probabilistic for (0.4,0.5]', mode = 'lines') %>%
add_trace(x = ~Year[odd_cut_over=="(0.5,0.6]"], y = ~empirical_over[odd_cut_over=="(0.5,0.6]"], name = 'Empirical for (0.5,0.6]', mode = 'lines') %>%
add_trace(x = ~Year[odd_cut_over=="(0.5,0.6]"], y = ~probabilistic_over[odd_cut_over=="(0.5,0.6]"], name = 'Probabilistic for (0.5,0.6]', mode = 'lines') %>%
layout(xaxis = list(title="Years"),
yaxis = list(title="Probability",range = c(0, 1)))
p1_initial
p2_initial<-plot_ly(summary_table, x = ~Year[odd_cut_over=="(0.3,0.4]"], y = ~empirical_over[odd_cut_over=="(0.3,0.4]"], name = 'Empirical for (0.3,0.4]', type = 'scatter', mode = 'lines')%>%
add_trace(y = ~probabilistic_over[odd_cut_over=="(0.3,0.4]"], name = 'Probabilistic for (0.3,0.4]', mode = 'lines') %>%
layout(xaxis = list(title="Years"),
yaxis = list(title="Probability",range = c(0, 1)))
p2_initial
p3_initial<-plot_ly(summary_table, x = ~Year[odd_cut_over=="(0.7,0.8]"], y = ~empirical_over[odd_cut_over=="(0.7,0.8]"], name = 'Empirical for (0.7,0.8]', type = 'scatter', mode = 'lines')%>%
add_trace(y = ~probabilistic_over[odd_cut_over=="(0.7,0.8]"], name = 'Probabilistic for (0.7,0.8]', mode = 'lines') %>%
layout(xaxis = list(title="Years"),
yaxis = list(title="Probability",range = c(0, 1)))
p3_initial
Based on these plots, the initial probabilities follow the empirical data closely for the bins (0.4,0.5] and (0.5,0.6]. This means that the company was better at predicting matches with a roughly 50/50 chance of over and under.
For the final odds:
Again, the company is better at predicting the probabilities in the middle bins. For the final probabilities, matches with low over probabilities also mostly ended up as under, so we can say that the company is good at predicting under results as well.
For this task, I selected the bookmaker 12BET. In the code below, I calculated the change in each odd (home win, away win and draw) between its first and last recorded value. I gathered all this information in a single data table called “merged_matches_change”.
matches<-data.table(readRDS("df9b1196-e3cf-4cc7-9159-f236fe738215_matches.RDS"))
odds<-data.table(readRDS("df9b1196-e3cf-4cc7-9159-f236fe738215_odd_details.RDS"))
matches=unique(matches)
matches[,c("HomeGoals","AwayGoals"):=tstrsplit(score,':')]
matches$HomeGoals=as.numeric(matches$HomeGoals)
matches$AwayGoals=as.numeric(matches$AwayGoals)
# "x" represents games that ended in a draw, "1" is a home win and "2" is an away win
matches[,homewin:="x"]
#Finding out who won (character values, to match the type of the "x" default)
matches[HomeGoals>AwayGoals,homewin:="1"]
matches[HomeGoals<AwayGoals,homewin:="2"]
matches=matches[complete.cases(matches)]
odds_1x2=odds[betType=='1x2' & bookmaker=='12BET']
odds_1x2=odds_1x2[order(matchId, oddtype,bookmaker,date)]
odds_1x2[,totalhandicap:=NULL]
#Calculating the Odd Changes
odds_1x2_change=odds_1x2[,list(odd_change=odd[.N]-odd[1]),
by=list(matchId,oddtype,bookmaker)]
odds_1x2_wide_change=dcast(odds_1x2_change,
matchId~oddtype,
value.var='odd_change')
merged_matches_change=merge(matches,odds_1x2_wide_change,by='matchId')
To visualize the relation between the odd changes and the match results, I used three box plots, one for each odd change. The box plot below shows the relation between the home win odd change and the results.
pchange1<-plot_ly(merged_matches_change, x = ~homewin, y = ~odd1, type='box') %>%
layout(title='Home Win Odd Change',
yaxis = list(title='Home Win Odd Change ',zeroline = FALSE),
xaxis = list(title='Results',zeroline = FALSE))
pchange1
According to the box plot, for away wins the home win odd increased most of the time, with more than 75% of the changes above zero. This is sensible: as the game time approaches, more information becomes available, such as weather conditions, player conditions and team formations. This change information can be used for predicting match results. The other two box plots, for away win and draw, are given below.
This box plot is again sensible. For home wins, the away win odds increased until match time. The draw odds did not increase as much as the away win odds.
An interesting takeaway from this plot is that bookmakers usually don’t make big changes to the draw odds. This may be because predicting a draw is harder than predicting a loss or a win, so they play it safe with draw odds.
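The quartile statements read off the box plots can also be checked numerically. Below is a minimal sketch on invented numbers; the toy table only mimics the columns homewin and odd1 of merged_matches_change, and the values are not taken from the real data.

```r
library(data.table)

# Toy home win odd changes per result (invented values)
toy_change <- data.table(
  homewin = rep(c("1", "2", "x"), each = 4),
  odd1    = c(-0.20, -0.10, 0.00, 0.05,   # home wins
               0.05,  0.10, 0.20, 0.30,   # away wins
              -0.05,  0.00, 0.05, 0.10)   # draws
)

# Quartiles of the change and the share of matches whose odd increased
change_quartiles <- toy_change[, list(q25 = quantile(odd1, 0.25),
                                      q50 = quantile(odd1, 0.50),
                                      q75 = quantile(odd1, 0.75),
                                      share_increased = mean(odd1 > 0)),
                               by = homewin]
print(change_quartiles)
```

If the lower quartile q25 of a group is above zero, then at least 75% of the changes in that group are increases, which is the pattern described for away wins in the home win box plot.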